01:36
2026-06-07
arxiv.org
large-language-models
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
Researchers introduced Gaia2, a benchmark for evaluating large language model agents in dynamic, asynchronous environments where scenarios evolve independently of agent actions. Testing of state-of-th…